Textual Expertise In Word Experts: An Approach To Text Parsing Based On Topic/Comment Monitoring

Author

  • Udo Hahn
Abstract

This paper presents prototype versions of two word experts for text analysis, which demonstrate that word experts are a feasible tool for parsing texts on the level of text cohesion as well as text coherence. The analysis is based on two major knowledge sources: context information is modelled in terms of a frame knowledge base, while the co-text keeps a record of the linear sequencing of text analysis. The result of text parsing is a text graph reflecting the thematic organization of topics in a text.

1. Word Experts as a Text Parsing Device

This paper outlines an operational representation of the notions of text cohesion and text coherence based on a collection of word experts as central procedural components of a distributed lexical grammar. By text cohesion, we refer to the micro level of textuality as provided, e.g., by reference, substitution, ellipsis, conjunction and lexical cohesion (cf. HALLIDAY/HASAN 1976), whereas text coherence relates to the macro level of textuality as induced, e.g., by patterns of semantic recurrence of topics (thematic progression) of a text (cf. DANES 1974). On a deeper level of propositional analysis of texts, further types of semantic development of a text can be examined, e.g. coherence relations, such as contrast, generalization, explanation (cf. HOBBS 1979, HOBBS 1982, DIJK 1980a), basic modes of topic development, such as expansion, shift, or splitting (cf. GRIMES 1978), and operations on different levels of textual macro-structures (DIJK 1980a) or schematized superstructures (DIJK 1980b).

The identification of cohesive parts of a text is needed to determine the continuous development and increment of information with regard to single thematic foci, i.e. topics of the text. Since texts contain topic elaborations, shifts, breaks, etc., the extension of each topic has to be delimited exactly and different topics have to be related properly. The identification of coherent parts of a text serves this purpose, in that the determination of the coherence relations mentioned above contributes to the delimitation of topics and their organization in terms of text grammatical well-formedness considerations. Text graphs are used as the resulting structure of text parsing and serve to represent the corresponding relations holding between different topics.

The word experts outlined below are part of a genuine text-based parsing formalism incorporating a linguistic level in terms of a distributed text grammar and a computational level in terms of a corresponding text parser (HAHN/REIMER 1983; for an account of the original conception of word expert parsing, cf. SMALL/RIEGER 1982). This paper is intended to provide an empirical assessment of word experts for the purpose of text parsing. We thus arrive at a predominantly functional description of this parsing device, neglecting to a large extent its procedural aspects.

The word expert parser is currently being implemented as a major system component of TOPIC, a knowledge-based text analysis system which is intended to provide text summarization (abstracting) facilities on variable layers of informational specificity for German-language texts (each approx. 2000-4000 words) dealing with information technology. Word expert construction and modification is supported by a word expert editor using a special word expert representation language, fragments of which are introduced in this paper (for a more detailed account, cf. HAHN/REIMER 1983, HAHN 1984). Word experts are executed by interpretation of their representation language description. TOPIC's word expert system and its editor are written in the C programming language and run under UNIX.

* Work reported in this paper is supported by BMFT/GID under grant no. PT 200.08.

2. Some General Remarks about Word Expert Structure and the Knowledge Sources Available for Text Parsing

A word expert is a procedural agent incorporating linguistic and world knowledge about a particular word. This knowledge is represented declaratively in terms of a decision net whose nodes are constructed of various conditions. Word experts communicate among each other as well as with other system components in order to elaborate a word's meaning (reading). The conditions are tested against at least two kinds of knowledge sources, the context and the co-text of the corresponding word.
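The decision-net description above can be illustrated with a minimal sketch in Python. It is not TOPIC's word expert representation language, which this excerpt does not show, and TOPIC itself is written in C. All names here (`ParseState`, `Node`, `run_expert`, the three condition functions, and the reading labels such as `topic-new`) are hypothetical and chosen only for illustration. The sketch assumes a toy frame knowledge base as the context and a list of already analysed words as the co-text.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class ParseState:
    # Context (assumed shape): toy frame knowledge base, frame name -> slot names.
    frames: dict
    # Co-text (assumed shape): linear record of (word, reading) pairs so far.
    co_text: list = field(default_factory=list)


@dataclass
class Node:
    # One decision-net node: a condition and a successor for each outcome.
    # A successor is either another Node or a final reading (a string).
    condition: Callable[[str, ParseState], bool]
    if_true: Union["Node", str]
    if_false: Union["Node", str]


def is_frame(word, state):
    # Context test: does the word name a frame in the knowledge base?
    return word in state.frames


def seen_before(word, state):
    # Co-text test: has this word already been analysed?
    return any(earlier == word for earlier, _ in state.co_text)


def is_slot_of_last_frame(word, state):
    # Mixed test: is the word a slot of the most recently mentioned frame?
    for earlier, _ in reversed(state.co_text):
        if earlier in state.frames:
            return word in state.frames[earlier]
    return False


# Hypothetical expert for nominal words, built from the three conditions.
NOMINAL_EXPERT = Node(
    is_frame,
    if_true=Node(seen_before, "topic-continued", "topic-new"),
    if_false=Node(is_slot_of_last_frame, "topic-elaborated", "no-reading"),
)


def run_expert(root, word, state):
    # Walk the net until a reading is reached, then record it in the co-text.
    node = root
    while isinstance(node, Node):
        node = node.if_true if node.condition(word, state) else node.if_false
    state.co_text.append((word, node))
    return node


def demo():
    state = ParseState(frames={"printer": {"speed", "price"}})
    words = ["printer", "speed", "printer", "banana"]
    return [run_expert(NOMINAL_EXPERT, w, state) for w in words]
```

Calling `demo()` runs the same expert over four words in sequence. Because each call appends to the co-text, the reading assigned to a word depends on what was analysed before it, which is the role the abstract assigns to the co-text.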

Similar resources

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A lot of news is spread on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Topic Essentials

An overview is provided of TOPIC, a knowledge-based text information system for the analysis of German-language texts. TOPIC supplies text condensates (summaries) on variable degrees of generality and makes available facts acquired from the texts. The presentation focuses on the major methodological principles underlying the design of TOPIC: a frame representation model that incorporates various...

Publication date: 1984